A Data Perturbation Approach to Privacy Protection in Data Mining
Abstract
Advances in data mining techniques have raised growing concerns about the privacy of personal information. Organizations that use their customers’ records in data mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove the identity-related attributes from customer records before releasing them to data miners or analysts. In this study, we investigate the effect of this practice and demonstrate that a majority of the records in a dataset can be uniquely identified even after identity-related attributes are removed. We propose a data perturbation method that organizations can use to prevent such unique identification of individual records while still providing the data to analysts for data mining. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in phase one (to preserve the marginal distribution), followed by a simple Bayes-based swapping procedure in phase two (to preserve the joint distribution). The proposed method is compared with a random perturbation method in classification performance on two real-world datasets. The experimental results indicate that the proposed method significantly outperforms the random method.
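The claim that records remain uniquely identifiable after identity-related attributes are removed can be illustrated with a small sketch. This is not the authors' procedure; it only counts how many records have a combination of remaining attribute values that no other record shares. The function name `unique_fraction`, the attribute names, and the toy records are all hypothetical and chosen for illustration.

```python
from collections import Counter

def unique_fraction(records, attributes):
    """Fraction of records whose values on `attributes` occur exactly once."""
    if not records:
        return 0.0
    # Build one key per record from the chosen (non-identity) attributes.
    keys = [tuple(rec[a] for a in attributes) for rec in records]
    counts = Counter(keys)
    # A record is uniquely identifiable if no other record shares its key.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Hypothetical toy data: names and ID numbers have already been removed.
records = [
    {"age": 34, "zip": "10001", "gender": "F"},
    {"age": 34, "zip": "10001", "gender": "M"},
    {"age": 51, "zip": "10002", "gender": "F"},
    {"age": 51, "zip": "10002", "gender": "F"},
    {"age": 29, "zip": "10003", "gender": "M"},
]

print(unique_fraction(records, ["age", "zip", "gender"]))  # 0.6
print(unique_fraction(records, ["zip"]))                   # 0.2
```

In this toy data, combining three ordinary attributes singles out three of the five records, while a single attribute singles out only one. A perturbation method of the kind described above would aim to lower this fraction while keeping the data useful for mining.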
Related resources
Modified Privacy Preserving Data Mining System for Improved Performance
Privacy of information and security have become essential concerns because of big data. Privacy Preserving Data Mining (PPDM) presents a novel framework for extracting and deriving information when the data is distributed among multiple parties. The concern of a PPDM system is to protect information from disclosure and misuse. A major issue with PPDM is to...
Performance Analysis of Clustering in Privacy Preserving Data Mining
Privacy is becoming an increasingly important issue in many data mining applications. This has triggered the development of many privacy preserving data mining techniques. A frequently used disclosure protection method is data perturbation. When used for data mining, it is desirable that perturbation preserves statistical relationships between attributes, while providing adequate protection for...
A Study on Data Perturbation Techniques in Privacy Preserving Data Mining
Student, Dept. of Computer Engineering, Grow More Faculty of Engineering, Himatnagar, Gujarat, India; Asst. Professor, Dept. of Computer Engineering, Grow More Faculty of Engineering, Himatnagar, Gujarat, India. Abstract: In recent years, the data mining techniq...
A Privacy-Preserving Classification Method Based on Singular Value Decomposition
With the development of data mining technologies, privacy protection has become a challenge for data mining applications in many fields. To solve this problem, many privacy-preserving data mining methods have been proposed. One important type of such methods is based on Singular Value Decomposition (SVD). The SVD-based method provides perturbed data instead of original data, and users extract o...
Feature Selection: A Preprocess for Data Perturbation
As a major concern in designing various data mining applications, privacy preservation has become a critical component seeking a trade-off between mining performances and protecting sensitive information. Data perturbation or distortion is a widely used approach for privacy protection. Many privacy preservation approaches were developed, either by adding noises or by matrix decomposition method...
Publication date: 2004